ABSTRACT

Three linear equating methods for the common item
non-equivalent populations design, a design often used in practice,
are compared by an analytical method. The methods include: (1)
Tucker's method; (2) Levine's method; and (3) Woodruff's congeneric
method. Tucker's method makes assumptions about observed
score regressions and is based on a linear regression model. The
Levine and congeneric methods make assumptions about true score
regressions and are based on linear structural models. The analysis
is graphically illustrated using data from actual test
administrations. If groups differ greatly as shown by their
performance on the anchor and if application of Tucker's equating
method is not tenable, the disattenuated correlation between Y and V
should be computed. If this disattenuated correlation is
significantly less than unity, the Levine method should not be used,
and the congeneric method becomes an appealing alternative. (SLD)
Abstract

Three linear equating methods for the common item nonequivalent 
populations design, a design commonly used in practice, are compared using an 
analytical method. The analysis is graphically illustrated using data from 
actual test administrations. Conclusions derived from the analysis which have 
implications for the practical application of these equating methods are 
discussed. 



Key Words: Linear Equating, Tucker's Equating Method, Levine's Equating
Introduction

Linear equating methods for the common item nonequivalent populations 
(CINEP) design are derived or discussed by several authors: Gulliksen (1950), 
Levine (1955), Angoff (1971,1982), Braun and Holland (1982), Kolen (1985), 
Woodruff (1986), and Kolen and Brennan (1987). Angoff (1971) refers to this
design as design IV: nonrandom groups. Under this design, a new test X is
given to group 1 and an old test Y is given to group 2, while a usually 
shorter anchor test V is given to both groups. The anchor test V may comprise 
a scoreable part of the tests and this is referred to as the inclusive anchor 
situation, or test V may not contribute to examinees' scores and this is 
referred to as the exclusive anchor situation. Two methods commonly used in 
practice for linear equating under the CINEP design are Tucker's equally
reliable method (Gulliksen, 1950; Angoff, 1971, 1982; Kolen, 1985) and
Levine's equally reliable method (Levine, 1955; Angoff, 1971, 1982; Woodruff,
1986). A third method recently introduced by Woodruff (1986) is called the
congeneric method. Tucker's method makes assumptions about observed score
regressions, while the Levine and congeneric methods make assumptions about
true score regressions. Tucker's method is based on a linear regression
model, while the Levine and congeneric methods are based on linear structural
models. The congeneric method is less restrictive in its assumptions than is
Levine's method, but as a consequence the congeneric method is slightly more
difficult to implement in that it requires an estimate of the anchor's
reliability. According to Angoff (1971), who cites Levine (1955) and Lord
(1960), Tucker's method is most appropriate for situations in which the two
groups show no more than small differences in mean and variance on the anchor,
while Levine's method (and the congeneric method also) may accommodate larger
differences so long as the true scores on the tests and anchor correlate
unity. The purpose of the present paper is to compare through analytical and
empirical means the performance of the three methods as the covariance between
the tests and the anchor varies. Klein and Jarjoura (1985) undertook a
similar investigation using only empirical methods. They noted that Levine's
method was more sensitive than Tucker's to a lack of content balance between 
the tests and the anchor. The present study will suggest an explanation for 
their finding which indicates that the performance of the congeneric method 
should be more similar to the Tucker method than to the Levine method as the 
covariance between the test and anchor decreases. This result has practical 
implications for equating and these will be discussed. Real test data will be 
used to graphically illustrate these conclusions. 

Analysis 

The analysis will begin with the exclusive anchor situation. Later, it 
will be shown how the results for the exclusive situation easily generalize to 
the inclusive situation. For the three methods under consideration (Tucker's
equally reliable method, Levine's equally reliable method, and the congeneric
method), if the two groups do not differ in either mean or variance on the
anchor, then all three methods reduce to Angoff's (1971) design I: random 
groups equal reliabilities method since no adjustment for group differences is 
necessary. If the groups do differ in performance on the anchor, then the
anchor differences are used to adjust for group differences on X and Y. The 
higher the correlation between V and X and V and Y the more likely that this 
adjustment is appropriate (Cook and Peterson, 1987; Angoff, 1987). It may be 
shown (Kolen and Brennan, 1987; Woodruff, 1986) that the following three 
parameters determine how these anchor group differences are incorporated into 
the equating for the Tucker, Levine, and congeneric methods respectively:

\[ \gamma_T = \sigma_{yv} / \sigma_v^2 , \]

\[ \gamma_L = (\sigma_y^2 + \sigma_{yv}) / (\sigma_{yv} + \sigma_v^2) , \text{ and} \]

\[ \gamma_C = \sigma_{yv} / (\sigma_v^2 \rho_{vv'}) . \]


The above gamma parameters pertain to the old test Y administered in
population 2. If the synthetic population (Braun and Holland, 1982) is
invoked, then the equating requires that the gamma parameters be estimated for
both the old and new tests. If the synthetic population is ignored
(Gulliksen, 1950; Woodruff, 1986; Kolen and Brennan, 1987), then the gamma
parameters need only be estimated for the old test. For simplicity, this
paper will ignore the synthetic population, but its conclusions apply equally
to equating with the synthetic population. In practice, these parameters are
usually estimated by the method of moments (Angoff, 1971, 1982; Woodruff,
1986).
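
The method-of-moments estimation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the score lists are hypothetical, the anchor reliability `rel_v` is assumed to be supplied separately (for example, an alpha coefficient), and the synthetic population is ignored as in the text.

```python
from statistics import mean

def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length score lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def gammas(var_y, var_v, cov_yv, rel_v):
    """Exclusive-anchor gammas, returned as (Tucker, Levine, congeneric)."""
    tucker = cov_yv / var_v
    levine = (var_y + cov_yv) / (cov_yv + var_v)
    congeneric = cov_yv / (var_v * rel_v)
    return tucker, levine, congeneric

def gammas_from_scores(y, v, rel_v):
    """Plug sample moments of test scores y and anchor scores v into the formulas."""
    return gammas(sample_cov(y, y), sample_cov(v, v), sample_cov(y, v), rel_v)

# Hypothetical scores for six examinees in group 2 (illustration only).
y = [12, 15, 9, 20, 17, 11]
v = [5, 6, 4, 8, 6, 5]
print(gammas_from_scores(y, v, rel_v=0.8))
```

Passing variances and the covariance directly to `gammas` keeps the formulas separate from the estimation step, so the same function can be reused with published summary statistics.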

To simplify the analysis, certain assumptions will be made which will
always be satisfied in the practical application of these linear equating
methods. They are: $\sigma_y > \sigma_v > 0$ and $0 \le \sigma_{yv} \le \sigma_y \sigma_v$, the latter being
equivalent to $0 \le \rho_{yv} \le 1$. In what follows $\sigma_{yv}$ will be treated as a
mathematical variable, but $\sigma_y$, $\sigma_v$, and $\rho_{vv'}$ will be treated as mathematical
constants. Under classical test theory, $\rho_{yv}^2 \le \rho_{vv'}$. The present analysis
allows the constant, $\rho_{vv'}$, to assume any value between zero and one.

Focusing first on the Tucker method shows that $\gamma_T$ is a linear function of
$\sigma_{yv}$ with positive slope $1/\sigma_v^2$ and zero intercept. Its minimum value of zero
occurs when $\sigma_{yv} = 0$, and its maximum value is $\sigma_y/\sigma_v$, which occurs when
$\sigma_{yv} = \sigma_y \sigma_v$. As $\sigma_{yv}$ decreases, the Tucker method gives anchor group
differences less weight in the equating process. This is a reasonable and
desirable property, since, as was previously mentioned, group differences
between Y and X will usually be reflected by group differences on V largely to
the extent that V correlates with Y and X.

The second method to be analyzed is Levine's. The first derivative of
$\gamma_L$ is $d\gamma_L/d\sigma_{yv} = (\sigma_v^2 - \sigma_y^2)/(\sigma_{yv} + \sigma_v^2)^2 < 0$. Its second derivative is
$d^2\gamma_L/d\sigma_{yv}^2 = 2(\sigma_y^2 - \sigma_v^2)/(\sigma_{yv} + \sigma_v^2)^3 > 0$. Hence, $\gamma_L$ is a decreasing function
of $\sigma_{yv}$ with upward concavity. Furthermore, $\gamma_L$ has a minimum value
of $\sigma_y/\sigma_v$ when $\sigma_{yv} = \sigma_y \sigma_v$, and a maximum value of $\sigma_y^2/\sigma_v^2$ when $\sigma_{yv} = 0$. Since
the minimum value of $\gamma_L$ coincides with the maximum value of $\gamma_T$, $\gamma_L \ge \gamma_T$.
As $\sigma_{yv}$ decreases, the Levine method gives anchor group differences more weight
in the equating process. This is an unreasonable and undesirable property,
but recall that the Levine method assumes that $\rho(T_y, T_v) = 1$, which implies
that $\rho_{yv} = (\rho_{yy'} \rho_{vv'})^{1/2}$, which in turn implies that $\sigma_{yv} = \sigma_y \sigma_v (\rho_{yy'} \rho_{vv'})^{1/2}$. The
above analysis reveals that the Levine method will perform poorly when this
assumption is violated.

Focusing, finally, on the congeneric method, $\gamma_C$ has behavior similar to
$\gamma_T$. It is a linear function of $\sigma_{yv}$, as is $\gamma_T$, but it has a steeper positive
slope given by $1/(\sigma_v^2 \rho_{vv'})$. Its minimum is also zero when $\sigma_{yv} = 0$, but its
maximum of $\sigma_y/(\sigma_v \rho_{vv'})$, when $\sigma_{yv} = \sigma_y \sigma_v$, is greater than $\gamma_T$'s maximum.
Consequently, $\gamma_C \ge \gamma_T$ with equality holding only when $\rho_{vv'} = 1$, as can also be
seen from an inspection of the formulas for $\gamma_C$ and $\gamma_T$. Like the Tucker
method, the congeneric method has the desirable property of giving less weight
to anchor group differences as $\sigma_{yv}$ decreases. However, the congeneric method,
like the Levine method, assumes that $\rho(T_y, T_v) = 1$ or equivalently that
$\sigma_{yv} = \sigma_y \sigma_v (\rho_{yy'} \rho_{vv'})^{1/2}$. The above analysis reveals that the congeneric
method, in contrast to the Levine method, performs reasonably when this
assumption is violated.
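
The behavior of the three gammas as the test-anchor covariance varies can be checked numerically. The sketch below assumes hypothetical constants (`sd_y`, `sd_v`, `rel_v` are invented for illustration and satisfy the stated condition that the test standard deviation exceeds the anchor's) and simply tabulates the three formulas over the admissible range of the covariance.

```python
def gammas(sd_y, sd_v, cov_yv, rel_v):
    """Return (Tucker, Levine, congeneric) gammas for the exclusive anchor case."""
    var_y, var_v = sd_y ** 2, sd_v ** 2
    tucker = cov_yv / var_v
    levine = (var_y + cov_yv) / (cov_yv + var_v)
    congeneric = cov_yv / (var_v * rel_v)
    return tucker, levine, congeneric

# Hypothetical constants chosen only for illustration (sd_y > sd_v > 0).
sd_y, sd_v, rel_v = 4.0, 2.0, 0.8
max_cov = sd_y * sd_v  # largest admissible covariance (correlation of one)
grid = [max_cov * k / 10 for k in range(11)]
rows = [gammas(sd_y, sd_v, cov, rel_v) for cov in grid]

print("  cov  Tucker  Levine  congeneric")
for cov, (t, l, c) in zip(grid, rows):
    print(f"{cov:5.2f}  {t:6.3f}  {l:6.3f}  {c:10.3f}")
```

Reading the table from bottom to top (decreasing covariance), the Tucker and congeneric columns fall toward zero while the Levine column rises toward the ratio of the variances, which is the contrast the analysis describes.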
The previous analysis has focused on the exclusive anchor situation. It
can be shown that the $\gamma$ parameters for all three methods in the inclusive
anchor situation equal their respective exclusive situation $\gamma$'s plus unity
(Woodruff, 1986). Hence, the above results for the exclusive anchor situation
apply to the inclusive anchor situation with only slight modification which
does not alter comparative performance between the three procedures.
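
For the Tucker method, the "plus unity" relation can be sketched in one line. The derivation below assumes that the inclusive total score is $Y^* = Y + V$ and that the inclusive Tucker gamma is the same regression slope applied to $Y^*$; the symbols $Y^*$ and $\gamma_T^{\mathrm{incl}}$ are introduced here for illustration only, and the general result for all three methods is the one attributed to Woodruff (1986).

```latex
\gamma_T^{\mathrm{incl}}
  = \frac{\operatorname{Cov}(Y^*, V)}{\sigma_v^2}
  = \frac{\operatorname{Cov}(Y + V,\, V)}{\sigma_v^2}
  = \frac{\sigma_{yv} + \sigma_v^2}{\sigma_v^2}
  = \gamma_T + 1 .
```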

Illustration 

The previous analysis is illustrated under the exclusive anchor situation
for four different test administrations in Figures 1 through 4. Though the
data are real, the exact details of the present application are not reflective
of the actual equating situations and are illustrative only.



Insert Figures 1 through 4 about here



To facilitate comparisons among the graphs, gamma has been rescaled by
multiplication by $1 = s_y s_v / (s_y s_v)$ so that each graph has its horizontal axis on
the scale of $r_{yv}$ from 0 to 1. At the bottom of the figures are the values of
the statistics from which the graphs were derived. The reliabilities are
alpha coefficients. For both groups, the number of test items and anchor
items for Figures 1 through 4 are, respectively, (295, 105), (190, 60), (55,
20), and (32, 13), while the number of examinees in group 1 and group 2 are,
respectively, (326, 305), (748, 1625), (700, 4093), and (1111, 4093). The
statistics are for the old test Y administered in group 2.
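
As a sketch of what this rescaling implies, assuming the sample analogues of the three gamma formulas and writing $s_{yv} = r_{yv} s_y s_v$, each gamma can be expressed as a function of the correlation alone once $s_y$, $s_v$, and the anchor reliability estimate (written $r_{vv'}$ here, a notation assumed for illustration) are fixed:

```latex
\gamma_T = \frac{s_y}{s_v}\, r_{yv}, \qquad
\gamma_L = \frac{s_y}{s_v} \cdot \frac{s_y + r_{yv}\, s_v}{r_{yv}\, s_y + s_v}, \qquad
\gamma_C = \frac{s_y}{s_v} \cdot \frac{r_{yv}}{r_{vv'}} .
```

At $r_{yv} = 1$ the Tucker and Levine expressions both reduce to $s_y / s_v$, which matches the endpoint behavior derived in the Analysis section.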

The figures are presented in order of test length with the longest and
most reliable test presented in Figure 1 and the shortest and least reliable
test presented in Figure 4. As a consequence, the figures are similarly
ordered by the actual sample value of the correlation between the test and its
anchor, $r_{yv}$, as can be seen from the vertical dashed line in each graph. The
graphs indicate that, as the reliability of the anchor decreases, the
discrepancy between the congeneric and Tucker gammas increases, as their
formulas indicate.

For Figures 1 and 2 the disattenuated correlation between the test and
its anchor is between .99 and 1.01; so, for these figures, the actual sample
correlation indicated by the vertical dashed line is the maximum attainable
Y-V correlation given the unreliability of the measures. In these figures,
the congeneric and Levine plots intersect at the sample value of the Y-V
correlation, which is appropriate since both methods assume unity for the value
of the disattenuated Y-V correlation. In Figure 3, the intersection occurs to
the right of the correlation, while in Figure 4 it is slightly to the left.
The disattenuated correlation for the data in Figure 3 is .92. The
disattenuated correlation for the data in Figure 4 is 1.02. Figure 3 will be
discussed in the next section since it so clearly demonstrates the central
point of this paper. Conversely, Figure 4 suggests a limitation. Here, where
the 13-item anchor is quite short, it is probable that the anchor's
reliability is slightly under-estimated with the result that the disattenuated
Y-V correlation and the congeneric gamma are slightly over-estimated.
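
The location of the intersection of the congeneric and Levine plots can be computed directly. The sketch below is an illustration, not part of the original analysis: it assumes the exclusive-anchor gamma formulas written on the correlation scale, sets the two equal, and solves the resulting quadratic $s_y r^2 + (1 - r_{vv'}) s_v r - r_{vv'} s_y = 0$ for its positive root. The function name is hypothetical; the input values are the Figure 3 statistics.

```python
import math

def levine_congeneric_crossing(sd_y, sd_v, rel_v):
    """Correlation at which the Levine and congeneric gammas coincide.

    Equating r*sd_y/(sd_v*rel_v) with
    (sd_y**2 + r*sd_y*sd_v)/(r*sd_y*sd_v + sd_v**2) gives
    sd_y*r**2 + (1 - rel_v)*sd_v*r - rel_v*sd_y = 0; the positive root is returned.
    """
    b = (1.0 - rel_v) * sd_v
    disc = b * b + 4.0 * rel_v * sd_y * sd_y
    return (-b + math.sqrt(disc)) / (2.0 * sd_y)

# Statistics reported for Figure 3: SD(Y), SD(V), REL(V).
r_cross = levine_congeneric_crossing(4.9067, 2.5022, 0.53138)
print(round(r_cross, 3))
```

Comparing the returned value with the sample Y-V correlation indicates on which side of the dashed line the two curves cross.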

Alpha coefficients were used in the estimation of the disattenuated
test-anchor correlations. These were judged to be appropriate reliability
estimates for the tests used here and for the illustrative nature of this
paper. Careful consideration is necessary for selecting an appropriate
reliability index to use in estimating disattenuated correlations and gamma
under the congeneric method. This topic is discussed by Lord and Novick 
(1968, sec. 6.5). 
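
A minimal sketch of the disattenuation step follows, using the standard correction for attenuation (observed correlation divided by the square root of the product of the two reliabilities). The function name and the input values are hypothetical; in the illustrations above the reliabilities would be alpha coefficients.

```python
import math

def disattenuated_correlation(r_yv, rel_y, rel_v):
    """Correct an observed test-anchor correlation for unreliability in both measures."""
    return r_yv / math.sqrt(rel_y * rel_v)

# Hypothetical values: observed correlation .60, reliabilities .80 and .50.
rho_true = disattenuated_correlation(0.60, 0.80, 0.50)
print(round(rho_true, 3))
```

Because the reliabilities are themselves estimates, the result can exceed one in a sample, as happens with the values slightly above unity reported for some of the figures.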
Discussion

The preceding analysis offers an explanation for the empirical results of
Klein and Jarjoura (1985), and it also has implications for the application of
these equating methods. If the groups differ greatly as evidenced by their
performance on the anchor, and as a consequence application of the Tucker
method is untenable, then before applying the Levine method the disattenuated
correlation between Y and V should be computed. If this disattenuated
correlation is significantly less than unity, then the Levine method should
also not be used. An appealing alternative is the congeneric method since it
permits large group differences and performs reasonably when $\rho(T_y, T_v) < 1$.
This situation is illustrated in Figure 3. Here $\gamma_L$ is about 2.4 and $\gamma_C$ is
about 2.1. The congeneric method gives about 12.5% less weight to the anchor
information on mean differences and about 23% less weight to the information
on anchor differences in variances (gamma is squared when applied to
variances). This reduction seems appropriate since the disattenuated Y-V
correlation is only .92, suggesting that the anchor may not be a perfect
representation of the test.
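
The two percentages follow from the approximate gamma values read from Figure 3 (Levine about 2.4, congeneric about 2.1); as a worked check of the arithmetic:

```latex
\frac{2.4 - 2.1}{2.4} = 0.125 , \qquad
1 - \left(\frac{2.1}{2.4}\right)^{2} = 1 - \frac{4.41}{5.76} \approx 0.234 .
```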

The present analysis which has led to the above conclusion is based on a
comparison of parameter values. In practice, these parameters will have to be
estimated from sample statistics as was illustrated in the four examples.
This does not compromise the above conclusion, however, since in all practical
applications of equating there are at least several hundred examinees in each
group and more usually many thousand. The parameter estimates will be derived
from sample first and second order moments and first order cross product
sample moments. Hence, the sample statistics will be consistent estimators of
the parameters and the large sample sizes met with in practice will ensure
that decisions based on the sample values are reasonably accurate.
FIGURE 1

Plot of Gammas for SD(Y)=32.837, SD(V)=12.691, and REL(V)=.86176.
The vertical dashed line indicates the actual value of COR(Y,V).
FIGURE 2

Plot of Gammas for SD(Y)=16.497, SD(V)=5.9077, and REL(V)=.70032.
The vertical dashed line indicates the actual value of COR(Y,V).
[Plot: horizontal axis COR(Y,V) from 0.0 to 1.0; plotted curves: Congeneric, Levine, Tucker.]

FIGURE 3

Plot of Gammas for SD(Y)=4.9067, SD(V)=2.5022, and REL(V)=.53138.
The vertical dashed line indicates the actual value of COR(Y,V).
FIGURE 4

Plot of Gammas for SD(Y)=3.4624, SD(V)=1.7160, and REL(V)=.35030.
The vertical dashed line indicates the actual value of COR(Y,V).